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ABSTRACT 

We present a new algorithm for generating merger trees and halo catalogs which explicitly ensures consis- 
tency of halo properties (mass, position, velocity, radius) across timesteps. Our algorithm has demonstrated 
the ability to improve both the completeness (through detecting and inserting otherwise missing halos) and 
purity (through detecting and removing spurious objects) of both merger trees and halo catalogs. In addition, 
our method is able to robustly measure the self-consistency of halo finders; it is the first to directly measure 
the uncertainties in halo positions, halo velocities, and the halo mass function for a given halo finder based on 
consistency between snapshots in cosmological simulations. We use this algorithm to generate merger trees 
for two large simulations (Bolshoi and Consuelo) and evaluate two halo finders (ROCKSTAR and BDM). We 
find that both the ROCKSTAR and BDM halo finders track halos extremely well; in both, the number of halos 
which do not have physically consistent prog enitors is at the 1-2% level across al l halo masses. Our code is 
publicly available at http : / / code . google . com/p/cons istent-trees[ Our trees and catalogs will 
made publicly available once the referee process is complete. 

Subject headings: dark matter — galaxies: abundances — galaxies: evolution — methods: N-body simulations 



1. EVITRODUCTION 

Over the past few decades, dark matter simulations have 
demonstrated increasing usefulness for validating theories of 
cosmology, for understanding systematic biases in observa- 
tions, and for constraining galaxy and large-scale structure 
formation. In coming years, the rapid expansion of obser- 
vational data coming from ground and space based surveys, 
including CANCELS, GAMA, BOSS, DES, Herschel, Pan- 
STARRS, BigBOSS, eROSITA, Planck, JWST, and LSST 
will mean that simulations will become even more important 
for modeling and understanding the detailed evolution of the 
cosmos. This wealth of data means that cosmological and 
galaxy properties soon will be measured to a new standard 
of precision; however, none of this will increase the accuracy 
of current cosmological constraints without a concordant in- 
crease in the quality of simulations and our ability to model 
the systematic biases inherent in the observations. 

Two of the principal outputs of dark matter simulations are 
halo catalogs and merger trees; namely, information about the 
deep potential wells where galaxies are expected to reside, 
and a history of the mergers and growth of these potential 
wells. Derived properties of these outputs, such as the halo 
mass function and auto-correlation function, must be under- 
stood at the one-percent and five-percent level, respectively, in 
order to use the f ull constraining power of future surveys for, 
e.g., dark energy ( |Wu et al.|2010| l. Similar levels of accuracy 



are required to be able to distinguish between differ ent values 
of the prim ordial non-gaussianity parameter /nl ( [Pillepich] 
|etal.|2010| ). In addition to raw accuracy, models of galaxy for- 
mation (e7g., semi-empirical abundance matching and semi- 
analytical models) depend on the physical consistency of cata- 
logs and merger trees, in the sense that they require physically 
reasonable halo growth histories and dynamically-plausible 
mergers to accurately model the build-up of galaxy properties 
(e.g., stellar mass, luminosity, metallicity, dust) over time. 

To date, while simulations largely agree on the final dark 
matter distribution, few comprehensive reviews have been 
performed to determine which combinations of halo finders 
and merge r tree codes produce the most accurate results (see, 
however, Knebe et al. 2011 1. In part, this is because cos- 



mological halosTiavecompricated structure; depending on 
the particle distribution, it may be difficult to tell which halo 
finder is "better" or "worse" except by using a tedious and 
subjective examination by hand. On the other hand, with 
comparisons performed on more clinical test cases, such as 
halos generated with perfect NEW profiles, it is difficult to 
know how the comparison results will translate to the messier 
world of cosmological halos. Furthermore, percent-level un- 
derstanding of the halo mass function requires not only that 
the halo finder should function well in the common cases, but 
that it should also be robust against even the most extremely 
misshapen halos. 
This paper avoids the problem of defining accuracy on a 
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cosmological simulation, and instead seeks to provide some 
clarity on this issue with a different approach. As noted ear- 
lier, a necessary (although not sufficient) precondition for ac- 
curacy is physical consistency. In most simulations, halo 
properties are expected to evolve slowly relative to the rate 
at which simulation timesteps are saved. As such, by com- 
paring halos across several timesteps, it becomes possible to 
analyze not only which halo finders most consistently deter- 
mine halo properties, but also, which halo finders result in the 
most reasonable evolution of halos, based on, e.g., their mass 
accretion, positions, and velocities. 

Our choice for the halo properties to compare across 
timesteps (i.e., halo mass, circular velocity, position, and ve- 
locity) implies that we must calculate the gravitational evo- 
lution of halos (as distinct from particles) across timesteps. 
This approach requires somewhat more effort than simpler 
approaches, but it nonetheless has several unique advantages. 
The most obvious one is that we can do extended tests on the 
physical consistency of halos, and thus, we may make quan- 
tifiable estimates of, e.g., the current accuracy of halo mass 
functions. Just as usefully, the approach allows us to repair 
halo catalogs and merger trees when inconsistencies are found 
— e.g., when a halo disappears for a few timesteps in the halo 
catalogs, we can regenerate its expected properties by gravita- 
tional evolution from the surrounding timesteps. Here, we use 
this approach to generate two sets of halo catalogs and merger 
trees for two large simulations (Bolshoi and Consuelo), which 
use different simulation codes (ART and GADGET-2, respec- 
tively) from halos found using the ROCKSTAR halo finder 

f Behroozi et al..2011 ). In addition, we g enerate merger trees 
or B olshoi using the BDM halo finder (Klypin et al. 1999 
201 1\ , which allows us to compare the consistency of differ 
ent halo finders on the same simulation. 

This paper is divided into sections as follows. First, we dis- 
cuss limitations of merger trees and halo finding in the litera- 
ture which can cause inconsistencies in ^ Next, we present 
details for the two main simulations (Bolshoi and Consuelo) 
and the two halo finders (ROCKSTAR and BDM) for which we 
calculate merger trees in ^ Then, we describe the meth- 
ods we use to predict halo locations and velocities across 
timesteps (i.e., the gravitational evolution methods) in ^ We 
discuss the metrics and methods for repairing merger trees 
and halo catalogs in ^and present quantitative tests for these 
methods in ^ Finally, we summarize our conclusions in g7 

2. CURRENT METHODS AND CURRENT ISSUES WITH MERGER 

TREES 

2.1. A Brief Overview of Current Methods 

Traditionally, merger trees have been generated by track- 
ing particles in identified halos from one timestep to another. 
In most approaches, a halo at one timestep (a progenitor) is 
linked to a halo at the next timestep (the descendant) if the 
majority of the particles in the progenito r end up in the de- 
scendant (e.g., 'Planelles & Quilis"20101 'Zhao et al. 1 12009 



ods range from ensuring that the most -bound particle is lo 
cated within the descend ant halo (e.g., van den Bosch||2002 



'Kauffmann et al. 1999 1, to creating merger trees based on 



Fakhouri & Ma 2008; Li et al. 2007; Nagashima et al. 2003 



Helly et al. 2003; H atton et al.i|2003nWechsler et al.||2002 
Tormen 1998; Roukema et al .|1997[|Lacey & Cole 



Generally, enhancements to this basic method have as- 
signed more weight to the trajectories of the most-bound par- 
ticles in each halo; this implies better continuity for the galaxy 
at the center of the dark matter potential well. Such meth- 

' See s3|for details on our cosmology assumptions, which are different 
between the two simulations in this paper. 



the trajectori es of a fraction of the most-bound particles (e.g .. 
Cole et al. 2008; Barker et al. 2006 ; Okamoto & Habe 2000!), 
and more complicate d continuous wei ghting me trics (e.g., 
Fakhouri et al. 2010; De Lucia & Blaizot 2007 ; Springel et al. 
|2005j|. For specialized purposes, such as calculating smooth 
accretion vs. substructure accretion, some studies have exam- 
ined splitting halos such that all the parti cles in each result- 
ing group end up in the same descendant (IGenel et al. 120101 

m9\ . — 



In some more recent implementa tions ( Sprin gel et al.l2005[ 
Allgood 2005 ; Allgood et al. 20^ |Wetzel"er al. 2009 , Wetzel] 
|& White|2010i|Klypin et al.i201 1^ attempts have been made to 
extend progenitor identification beyond particle-based merger 
trees to ensure the halo continuity; however many of these at- 
tempts have focused on robust substructure tracking using a 
subset of halo properties. Here we extend this approach to 
consider a wide range of properties for both halos and subha- 
los, including halo mass, maximum circular velocity (v,nax), 
position, and bulk velocity. As halo finders have known 
imperfections — e.g., problems finding halos near the resolu- 
tion limit and problems resolving substructure near the cen- 
ters of host halos — these imperfections translate into prob- 
lems with accretion histories in purely particle-based merger 
trees. Just as importantly, halo finders can have unknown im- 
perfections which particle-based merger trees cannot revealj^ 
Hence, it is the desire to fix both the known and the unknown 
problems with merger trees which has motivated us to attempt 
more advanced methods for tree construction. We present 
a sampling of a few known problems (which affect all halo 
finders) in the following section ( ^2.2| l. 

2.2. Consistency Problems 

Particle-based halo finders and merger tree algorithms have 
been very successful in matching galaxy populations from 
very high redshifts (z ^ 8) to the present day for massive clus- 
ters to dwarf sat ellites (some recent examples of the relevant 
simulations are Klypi n etal. 201 IjlCrocce et al.l2010UBoylan-| 
Kolchin et al. 2009 ; Dlema^Jt al. 2008 ; Springel et al. 200^ 
|2005). At the same time, there are several instances in which 
problems can appear that do not satisfy future requirements 
for high-precision halo catalogs and trees: 

1 . A satellite halo may not be identified during a timestep 
where its orbit passes close to the center of a larger halo. 
As a result, in the merger tree, it would be classified as 
merged with the larger halo (which would receive most 
of its particles), but then at the next timestep, the satel- 
lite would be identified as a new halo with no progeni- 
tors. 

2. In the less extreme case where the halo finder identifies 
a satellite halo close to the center of a larger halo but 
where many of the particles are mis -identified as being 
associated with the central halo, there would be a sim- 
ilar result: the merger tree would record a false merger 
and the sudden appearance of a new halo with no pro- 
genitors. Approaches which track only the most-bound 

^ Indeed, several previously unknown halo finding issues with both BDM 
and ROCKSTAR were identified and fixed as a result of developing this algo- 
rithm. 
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particles would result in fewer such cases in the merger 
tree, but the halo properties (e.g., halo mass, v^ax, etc.) 
would remain incorrect for those timesteps. 

3. For halos whose particles are distributed among many 
other halos at the next timestep, the fate of the halo's 
galaxy is not clear — depending on which particles end 
up in other halos, the galaxy could either be disrupted 
into the ICL, or it could end up in one of the halos 
which received the most-bound particles. This has par- 
tial overlap with case (1), as two merging halos may 
be mis-identified as a single halo during a close ap- 
proach and then be identified as multiple halos at the 
next timestep. 

4. The opposite effect may also happen — a subhalo pass- 
ing close to the center of a larger halo may erroneously 
be assigned particles from the central halo, leading ei- 
ther to a spuriously large subhalo or a duplicate of the 
central halo. 

5. Halos just on the threshold of identification (e.g., be- 
cause of low particle numbers) may appear and disap- 
pear several times over successive timesteps, leading to 
false mergers, halos with no descendants, or simply a 
bias against low-mass halos, depending on the merger 
tree implementation. 

Each of these problems leads to a systematic bias in certain 
kinds of galaxies, which would then affect precision compar- 
isons with observations. Issues with satellite halos would re- 
duce the number of close galaxy pairs predicted by the simu- 
lation. Issues with halos close to the resolution limit and halos 
undergoing major mergers would result in systematic under- 
counting of the halo mass function. This is particularly im- 
portant at the massive end, where mergers are common to the 
present data and accurate constraints are required for preci- 
sion cosmology. Finally, issues with halos not having correct 
progenitor tracks would result in either incorrect modeling (in 
terms of semi-analytic galaxy models) or incorrect matching 
(in terms of the abundance matching approach) of galaxies 
in halos, making it more difficult to compare galaxy catalogs 
between simulations and observations. Many of these prob- 
lems have been addressed to varying degrees in the literature 
previously, in particular those issues relating to robust track - 
ing of substructure, see for example Wechsler et al.' (! 2002[); 
[Springel (2005); Faltenbacher et al. 
12006 



(2009 



1 2005); Allgoo d et al. 
; |Harkeretal.| (2006); Wetzel et al. (2009); Twe ed et al. 



. Here we attempt to address all of these issues system- 
atically and simultaneously. 

3. THE SIMULATIONS AND HALO FINDERS 

In this paper, we present merger trees for two large ACDM 
dark matter simulations (Bolshoi and Consuelo). These simu- 
lations demonstrate the applicability of our method to two dif- 
ferent simulation codes (ART and GADGET-2, respectively), 
as well as two different halo finders (ROCKSTAR and BDM), 
each with d iffer ent strengths and weaknesses. We describe 
Bo lshoi in ^3.1| Consuelo in §3.2 



the ROCKSTAR algorithm 

in ^3.3| the BDM algorithm in 5 X? and we describe the initial 
particle-based merger trees for Bolshoi and Consuelo in 5 3.5 
We have also condu cted some Umited co mparisons with the 
SUB FIND algorithm ( [Springel et al.||2001| l on a subregion of 
Bolshoi in Appendix C] 



3.1. Bolshoi 

We present merger trees for a new high-resolution simula- 
tion, Bolshoi, described in detail in Klypin et al. (201 1). Bol- 
shoi follows a comoving, periodic box with side length 250 
/i"' Mpc with 2048^ (w 8.6 x lO**) particles from redshift 80 
to the present day. Its exquisite mass resolution (1 .9 x 10^ Mq 
per particle) and force resolution (1 kpc) make it ideal for 
studying intrinsic properties, clustering, and evolution of ha- 
los from 10"^ Mq (e.g., satelHtes of the Milky Way) to the 
largest clusters in the universe (10'^ Mq). Bolshoi was run as 
a collisionless dark matter simulation with the Adaptive Re- 
finement Tre e Code (ART; [Kravtsov et al.|1997[ [Kravtsov &] 
Klypin|1999l ) assuming a flat, ACDM cosniology (Qm = 0.27, 
S 2a = 0.73, h = 0.7, cg = 0.82, and n, = 0.95). These cosmolog- 
ical parameters are con sistent with results from both WMAP5 
(Komatsu e t"aL][2009ll and the latest WMAP7h-BAOh-Ho re- 
sults (Ko matsu et al.|20Tl) . Our merger trees are constructed 
using 180 output snapshots of the simulation, and contain a 
total of nearly 3 billion halos. 

3.2. Consuelo 

We also present merger trees for a second simulation, Con- 
suelo, which is one box out of a suite of 200 taken from the 
Large Suite of Dark Matter Simulations (McBride et al, in 
preparation)]^ Consuelo covers a larger volume (420 Mpc 
on a side) with fewer particles (1400^) making it ideal for 
studies of cosmic variance in high-redshift surveys. The mass 
resolution per particle (2.7 x 10^ Mq) implies a completeness 
limit close to 2 x lO" Mq for halo masses; the reduced force 
resolution (softening length of 8 kpc) as compared to Bol- 
shoi implies a reduced ability to track subhalos within the 
virial radius of a larger halo. Consuelo was run as a collision- 
less dark matter simulation using GADGET-2 (Springel 2005), 
with a flat, ACDM cosmology (Qm = 0.25, Qa = 0.75, h = 0.7, 
(78 = 0.8, and n^ = 1 .0) which is sim ilar to the WMAP5 best- 
fit cosmology ( Komatsu et al. 2009 1. Our default merger trees 
for this simulation are constructed using 100 output snapshots 
and contain a total of approximately 500 million halos in all 
timesteps. We present results from Consuelo largely in Ap- 
pendix |B] to streamline the content of the main body of the 
paper. 

3.3. The ROCKSTAR Halo Finder 

For the main results in this paper, both the Bolshoi and 
Consuelo simulations were analyzed using the ROCKSTAR 
halo finder. The ROCKSTAR algorithm is a newly-developed 
phase-space temporal (7D) halo finder designed for increased 
consistency and accuracy of halo prop erties, especia lly for 
satellites and major mergers (Behroozi et al.|[2011| l. The 
method first divides the simulation volume into 3D friends- 
of-friends groups with a large linking length (b = 0.28) for 
easy parallel analysis. For each group, particle positions and 
velocities are normalized by the group position and veloc- 
ity dispersions, giving a natural phase-space metric. Then, 
the algorithm adaptively chooses a phase-space linking length 
such that 70% of the group's particles are linked together 
into subgroups. This process repeats for each subgroup — 
renormalization, a new linking-length, and a new level of 
substructure calculated — until a full hierarchy of particle sub- 
groups is created. Seed halos are then placed in the dens- 
est subgroups, and particles are assigned hierarchically to the 



' LasDamas Project, http : //Iss . phy . vanderbilt . edu/lasdamas/ 
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closest seed halo in phase space (see Behroozi et al.|20lT for 
full details). Finally, once particles have been assigned to ha- 
los, unbound particles are removed and halo properties (posi- 
tions, velocities, spherical masses, radii, spins, etc.) are calcu- 
lated. If halos at the previous snapshot are available, they are 
used to determine the host halo / satellite halo relationships in 
cases (such as major mergers) where they are ambiguous. 

3.4. The BDM Halo Finder 
The basic technique of the BDM halo finder is described 



Klypin & Holtzman (1997); a more detailed description 
is given in 'Rie be et~al.| (jlOIljl, and tests and comparisons 



with other codes are presented in [Knebe et al. (201 1 1. The 



code uses a spherical 3D overdensity algorithm to identify 
halos and subhalos. It starts by finding the density for each 
individual particle; the density is defined using a top-hat fil- 
ter with a given number of particles A^fiitei , which typically is 
Milter = 20. The code finds all density maxima, and for each 
maximum it finds a sphere containing a given overdensity 
mass Ma = (47r/3)Apcr^^, where pa- is the critical density 
of the Universe and A is the specified overdensity. 

Among all overlapping spheres the code finds the one that 
has the deepest gravitational potential. The density maximum 
corresponding to this sphere is treated as the center of a dis- 
tinct halo. Thus, by construction, a center of a distinct halo 
cannot be inside the radius of another one. However, periph- 
eral regions can still partially overlap, if the distance between 
centers is less than the sum of halo radii. The radius and mass 
of a distinct halo depend on whether the halo overlaps or not 
with other distinct halos. The code takes the largest halo and 
identifies all other distinct halos inside a spherical shell with 
distances R = ( 1 - 2)7?center from the central large halo, where 
^center IS the radius of the largest halo. For each halo selected 
within this shell, the code finds two radii. The first is the dis- 
tance /?big to the surface of the large halo: /?big = ^center- 
The second is the distance Rmax to the nearest density max- 
imum in the shell with the inner radius min(/?big,^A) and 
the outer radius max(/?big,^A) from the center of the selected 
halo. If there are no density maxima within that range, then 
^max = ^A- The radius of the selected halo is the maximum 
of /?big and /?max- Once all halos around the large halo are 
processed, the next largest halo is taken from the list of dis- 
tinct halos and the procedure is applied again. This setup is 
designed to make a smooth transition of properties of small 
halos when they fall into a larger halo and become subhalos. 

The bulk velocity of either a distinct halo or a subhalo is 
defined as the average velocity of the 100 most bound parti- 
cles of that halo or by all particles, if the number of particles 
is less than 100. The number 100 is a compromise between 
the desire to use only the central (sub)halo region for the bulk 
velocity and the noise level. 

The gravitational potential is found by first finding the mass 
in spherical shells and then by integration of the mass profile. 
The binning is done in log radius with a very small bin size of 
Alog(7;) = 0.01. 

Centers of subhalos can only be found among density max- 
ima, but not all density maxima are subhalos. An important 
construct for finding subhalos are barrier points: a subhalo ra- 
dius cannot be larger than the distance to the nearest barrier 
point times a numerical tuning factor called an overshoot fac- 
tor /over ~ 1.1-1.5. The subhalo radius can be smaller than 
this distance. Barrier points are centers of previously identi- 
fied (sub)halos. For the first subhalo, the barrier point is the 
center of the distinct halo. For the second subhalo, it is the 



first barrier point and the center of the first subhalo, and so on. 
The radius of a subhalo is the minimum of (a) the distance to 
the nearest barrier point times /over and (b) the distance to its 
most remote bound particle. 

3.5. Particle-Based Merger Trees 

As part of the algorithm process for generating gravita- 
tionally consistent trees, we computed simple particle-based 
merger trees for both Bolshoi and Consuelo. The algorithm 
assigns a descendant to a halo based on which halo at the 
next timestep receives the largest fraction of the halo's par- 
ticles (excluding substructure). In principle, this method is 
sufficient to correctly predict the vast majority of halo de- 
scendants (although of course it cannot identify cases where a 
halo should never have existed, or where a halo was missed). 
An algorithm based on some fraction of the most-bound parti- 
cles would be superior in cases where a subhalo is undergoing 
rapid stripping (e.g., when it loses more than half of its par- 
ticles to its host); however, the algorithm we present in this 
paper can very effectively correct for such cases. 

Indeed, for the EDM analysis of Bolshoi, only the 250 most- 
bound particles were available; as such, we created initial 
merger trees based on only these 250 particles for each halo. 
The use of incomplete particle information resulted in a small 
fraction (w 0.2-0.5% between timesteps) of halos without 
descendants and a large fraction (10%) of spurious links (see 
§5.31l — especially for large halos and halos undergoing ma- 



jor mergers. By comparison, the fraction of spurious links in 
the ROCKSTAR particle trees was between 1-3%, depending 
on redshift. The initial BDM tree thus represents a worst-case 
scenario for a particle-based merger tree, but it serves as an 
ideal proving ground for the efficacy of our algorithm. 

4. GRAVITATIONAL HALO EVOLUTION EQUATIONS 
4.1. Overview 



To solve the problems identified in 52.2 it is necessary to 



enforce consistency in the halo catalog across timesteps. No 
matter how well-written the halo finder is, there will always 
be halos which (for example) cross the threshold of detection 
in one timestep and then disappear in the next. It is impossi- 
ble to tell whether those cases are statistical fluctuations or not 
based only on the information available at a single timestep — 
otherwise, presumably, appropriate logic could be added to 
the halo finder to account for them. Thus, the presence or ab- 
sence of a halo in adjacent timesteps lends otherwise unavail- 
able evidence which helps determine whether the halo should 
be present in the current timestep. 

Knowing the positions, velocities, and mass profiles of ha- 
los at one timestep, we may use the laws of gravity and inertia 
to predict their properties at adjacent timesteps. By compar- 
ing the predicted halo catalogs with the actual ones, and by 
calculating the deviations from the predictions, we can imme- 
diately tell whether the halo finder has missed or misidentified 
halos. The approach taken herein is straightforward and ef- 
fective when the halo catalogs have already been calculated]^ 
We detail the equations used in this model in the next two sec- 
tions; first for predicting bulk halo motion ( ^4.2| i, and second 
for predicting tidal disruption of halos into a more massive 
host (§4.3 1. Detailed tests of the model accuracy are presented 



in later sections of the paper (§6.1 and §6.2 1 



A future paper may explore integration with halo finders in order to im- 
prove halo identification in situ as the halo catalogs are being generated. 
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4.2. Gravitational Evolution for Predicting Most-Massive 
Progenitors 

In predicting halo motion between successive timesteps, we 
make a few simplifying assumptions. First, we assume that 
the kinematics of dark matter halos are principally affected 
only by the positions and mass profiles of other dark matter 
halos in the simulation. While this assumption breaks down at 
the very highest redshifts, it remains remarkably accurate for 
halos at currently-observable redshifts (out to at least z ^ 10; 
see 5 6. 1 1. To additionally reduce the complexity of our code, 



we appr oximate halo mass d istributions by spherical NFW 
profiles (Navarro et al. 1997| l. Thus, each halo is fully de- 
scribed by a position vector, a velocity vector, a scale radius 
(r,), and a characteristic density (po). Again, while this as- 
sumption is incorrect in detail, it nonetheless remains remark- 
ably accura te fo r tracking halo motion between consecutive 
timesteps (5 6.1 1. 

Using the positions, velocities, and halo profile informa- 
tion for halos at one timestep, we may predict the positions 
and velocities of their most-massive progenitors. We apply 
Newtonian gravity to halos embedded in the standard FLRW 
expanding coordinate system. In the usual formula, the force 
between two halos would bej3 



GM1M2 



'1^2 = 



(1) 



Where r is the distance between halo centers. We approx- 
imate the mass Mi as the dark matter bound to the first halo 
within a radius r of its center. This may be calculated by in- 
tegrating the NFW profile of the first halo out to the desired 
radius: 



M^Fwir, r., , Po) = 47rpo / — ( 1 + — 
lo r \ r. 



r^dr 



= 4TTpor^ 



ln(l + ^ 



(2) 



In addition, we note that subhalos have a steadily decreasing 
influence on the bulk motion of the host as they approach the 
host's center, so we introduce a softening length of a virial 
radius to avoid this problem]^ Thus, the full force equation 
becomes 



^\^2\ = 



GMf^(r,r,j,poA)M, 



vir,2 



(3) 



and the contribution to the acceleration of the second halo is 

GMNFw('",rs,l>PO,l) . 



Ad2-- 



r-^ + r. 



(4) 



vir,2 



We do not find it necessary to introduce additional terms for 
dynamical friction, even on the order of timesteps which are 
300Myr, as it remains a subdominant source of error in terms 
of calculating halo velocities. 

^ Note that as dark matter halos are extended and non-rigid, the notion 
of the "force" between two halos is somewhat ambiguous. In this case, we 
define the force on a halo to be the acceleration of its center times the virial 
mass of the halo so that the expected equation F = ma still holds. 

* We take the virial mass (Myir) and radius (ryir) of a halo to be defined 
in terms of the virial spherical overdensity (Ayj,-) with respect to the mean 
background density as given by |Bryan & Norman|jl998| . I.e., Ayj, = ( 1 Stt^ + 
82.v-39.t2)/(l +^); X = (1 +PA(z)/PM(z)r' - 1 



While the most straightforward way to predict halo loca- 
tions would be to evaluate forces between every pair of ha- 
los, this would require 0(n^) time to compute. This becomes 
prohibitive for today's best simulations (where the number of 
halos is n ^ 10^), so we describe an approach to restrict the 
computation time to roughly C'(nln(n)). 

For a desired accuracy in halo velocities (Av) and a given 
timestep length (Af), we need only calculate accelerations 
(Afl) for nearby halos; namely, those which satisfy Aa > 
Av/Af. To first order, given Eqn. |4] a halo will accelerate 
nearby halos only as a function of the separation distance: 



Aa . 



GMv 



(5) 



We may then define a cutoff distance beyond which we need 
not calculate the gravitational effects of a given halo: 



?"cutoff(Mvir) ■■ 



Av/Af' 



(6) 



as the given halo will not affect the velocities of other halos 
beyond rcutoff by more than the desired velocity accuracy. Us- 
ing binary space partitioning (BSP) trees to efficiently access 
halos by location ^we thus limit the amount of work we have 
to do for each halo to be proportional to the number of halos 
within a distance rcutoff- Profiles of our implementation sug- 
gest that the work required to build and access the BSP trees is 
much larger than the work required to calculate gravitational 
forces, which results in a runtime which scales as 0(nln(n)). 

We find that setting a velocity tolerance of Av = 5 km s"' 
and using leapfrog integration is more than adequate for the 
purpose of tracking halos; this results in a systematic dri ft fo r 
predicted halo positions of at most 5 pc Myr~' (see also 5 6.1 



4.3. Tidal Forces and Mergers 

In the case where a halo is not found near its predicted loca- 
tion, it may either be a halo which briefly fluctuates above the 
threshold for halo detection, or it may be a halo which merges 
into a larger halo at the next time step. Some indication of 
which of these two fates occurred may be obtained via an es- 
timate of the tidal field at the center of the halo in question, 
i.e., the spatial derivative of the acceleration field (see Eq.Hl). 
To leading order, the tidal field exerted by one halo on another 
will bel3 



da 
dr 



oclrl 



GMNFwir,rsA,Poj) 



(7) 



In particular, we find that a simple threshold cut on the 
value of |r| reliably separates halos which coul d rea sonably 
undergo mergers from those that could not (see 56.2 for vali- 
dation). 

5. A GRAVITATIONALLY CONSISTENT METHOD FOR REPAIRING 
HALO CATALOGS AND MERGER TREES 

5.1. Overview 

' BSP trees are a superset of id-trees. In the latter approach, the size 
of the refinement volumes is generally fixed at a given refinement level. In 
our approach, the size of the refinement volumes shrinks to exactly cover the 
enclosed elements, yielding slightly better memory usage and access times. 

The coefficient of proportionality varies from 6-10, under the assumption 
of NFW profiles, and depends only weakly on the distance between the two 
halos. 
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In this section, we assume the existence of halo catalogs for 
every output timestep of a dark matter simulation. Particle- 
based merger trees are required for calibrating the metric for 
calculating likely progenitors; however, as such, they do not 
need to be computed for the entire volume of the box, nor do 
they ne ed to be especially robust. As we demonstrate at the 
end of ^5.3[ using merger trees based on only the 250 most- 
bound particles gives identical final results in our approach as 
using merger trees based on the trajectories of all particles in 
each halo. 

The consistency method that we adopt is by no means the 
only one. Nonetheless, our gravitational method has several 
unique advantages: 

1. Missing halos may be fully reconstructed, with con- 
sistent positions, velocities, and halo properties, along 
with quantifiable error estimates for the reconstruc- 
tions. 

2. Merger tree links are assigned a natural likelihood esti- 
mate. Particle-based links between halos which are too 
far apart in position, velocity, or mass may then be cut 
and reconnected to more likely candidates. 

3. The resolution limit of the simulation — i.e., the parti- 
cle number below which halo properties are no longer 
reliably calculated — ^becomes explicitly quantifiable in 
terms of the errors induced in the position and velocity 
of halos. 

4. The method subjects the general reliability of the halo 
finder to an independent check; different halo finders 
may then be compared on an even footing to evalu- 
ate how self-consistently they recover halo properties 
across multiple timesteps. 

5. The method provides a clean way to distinguish be- 
tween subhalos which are tidally disrupted at the next 
timestep and subhalos which are instead lost by the 
halo finder, which would otherwise have an identical 
particle-based merger tree. 

We first discuss some basic metho dology in terms of linking 
progenitors and descend ants ( |5.2|i an d then discuss the two 
stages of our algorithm (5 5.3 and ^5.4|i. 



5.2. Linking Progenitors and Descendants 

Underlying our approach for repairing merger trees is the 
observation that in ACDM, halos do not spontaneously ap- 
pear with large masses; instead, they are built up from smooth 
accretion and mergers of smaller halos. This implies that ev- 
ery halo has at least one progenitor at the previous timestep 
(although the mass of the progenitor may be too small for the 
halo finder to recover it). Hence, if we trace a halo backwards 
to its expected location at the previous timestep but do not 
find a progenitor there, we may conclude that the halo catalog 
is incomplete. If we find a match in the halo catalogs after 
tracing the halo backwards for a few more timesteps, we may 
interpolate the intrinsic halo properties (e.g., halo mass, v,nax, 
etc.) between the timesteps and assign positions and veloci- 
ties based o n the best estimates of the gravitational evolution 
algorithm ( ^4.2| i. However, if we do not find a good match, 
we may either conclude that the halo is small enough to have 
just formed or that the halo is a spurious detection and should 
be removed. 



Across timesteps, many halo properties are expected to 
change slowly (e.g., Vmax, A/vir, Rvh, and angular momentum) 
or predictably (e.g., position and velocity). These properties 
may then be used to tell whether a given halo has a reason- 
able progenitor in the catalog. To calibrate what is considered 
"reasonable," we use particle-based merger trees to determine 
the accuracy of our predictions for progenitor properties as 
compared to the actual progenitor properties in the trees. The 
characteristic errors in predicting position (tj), velocity (t,,), 
and v'lnax (Tinifli) yield a natural distance metric, t/j^In partic- 
ular, if we denote the expected progenitor properties with a 
subscript e and those of a candidate progenitor by subscript c, 
we have: 



d(e,c) ■■ 



\ 



2t2 



2r2 



2t2 

vmax 



(8) 



This gives a natural ranking for candidate progenitors, as 
well as a natural way to supply a threshold (i.e., maximum 
acceptable value for the metric) for physical consistency. 

Of course, there are many ways of determining characteris- 
tic scales for errors. In our code, we choose to take the sum 
of the average error and the standard deviation of the error for 
position and velocity (e.g., r, = (| Ax|) + CT|a.v|) because this 
yields increased robustness against unusual probability distri- 
butions. Yet as might be expected, in practice, we find that 
both the average error and the standard deviation of the er- 
ror are always within a factor of two of each other. We find 
additionally that and r,, have a dependence on halo mass; 
we account for this by binning halos by 0.25 dex in mass and 
then calculating and r,, separately for each binFj For esti- 
mating Vmax.f in Eq.[8] we can calculate an expecteofractional 
change in v,nax over each timestep as a function of mass. We 
then may take r,,,„„v to be the standard deviation of the loga- 
rithmic change in v^ax across timesteps as a function of mass 

(i.e., Criog,„(A,w)(Mvir)). 

Note that defining the metric in this way has the unique 
advantage that it is not necessary to calculate particle-based 
merger trees for all halos. If the trees are available for even a 
small portion of the simulation, that is sufficient to calculate 
the distance metric (that is, the values of r,, r,,, and T„„a.x) 
which applies to the entire volume. This feature makes our 
algorithm suitable even when particle IDs are not stored for 
all particles or when particle IDs are not consistent across the 
entire volume (as in tiled simulations). 

5.3. Stage One: Fixing Links and Filling in Missing Halos 

As described in the previous section, every correctly- 
identified halo must have a progenitor at the previous 
timestep, except for those halos whose progenitors are below 
the mass-resolution limit of the simulation. As such, we be- 
gin by evolving the halos at one timestep (t„) backwards to 
the previous timestep (f„-i) to predict the properties of the ex- 
pected most-massive progenitor. As explained in the previous 

' We expect Myir and Ryir to be highly correlated with Vmax, except in the 
case of subhalos, where they are harder to predict and consequently yield 
less accessible information than Vmax- Thus, we exclude them from direct 
consideration in our metric. In a future revision of our code, we may add 
support for angular momentum comparisons between halos, but we do not do 
so at present because this property is not yet reported by all halo finders. 

This results in an ambiguity in Eq. Isl — e.g., should one use Ty(My[y g) 
or Ti-(Mvi,f)? Arguments can be made tei' and against both choices, with 
neither one being obviously superior. Computationally, however, it is easier 
to choose (as we do) to use the expected properties (e.g., Tr(Mvii j)). 



Gravitationally Consistent Halo Catalogs and Merger Trees 



7 



Halo Merger Tree Algorithm 

1. Identify halo descendants using a traditional par- 
ticle algorithm. 

2. Gravitationally evolve the positions and velocities 
of all halos at the current timestep back in time to 
identify their most likely positions at the previous 
timestep. 

3. Based on predicted progenitor halos in step (2), 
cut ties to spurious descendants. 

4. Create links for halos with likely progenitors at 
the previous timestep for cases in which step (2) has 
identified a good match. 

5. For halos in the current timestep without likely 
progenitors, create a new halo at the previous 
timestep with position and velocity given by the evo- 
lution in step (2). Remove any such halos generated 
from previous rounds if they have had no real pro- 
genitors for several timesteps. 

6. For halos in the previous timesteps which have 
no descendants, assume that a merger occurred into 
the halo exerting the strongest tidal field across it at 
the previous timestep. If a halo with no descendant 
is too far removed from other halos to experience a 
significant tidal field, assume that it is a statistical 
fluctuation and remove it from the tree and catalogs. 

Fig. 1 . — A visual summary of the first stage of the merger tree algorithm. 



n-l 




section, these predictions in combination with the particle- is the most-massive progenitor of a halo e at timestep f„_i — 
based merger trees allow us to calculate a metric d{e,c) to i.e., that the connection or link between c and e is physically 
evaluate the likelihood that a candidate halo c at timestep f„ reasonable. 
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Once calculation of the metric is complete, we break all po- 
tentially problematic links in the particle-based merger trees. 
These include: 

1. All links where the metric d(e,c) is above some prede- 
termined threshold dtreak- We choose dbreak = 3.2, which 
results in 1-2% of all particle-based links to be broken 
in the Bolshoi simulation. 

2. All links where the progenitor is not the most-massive 
progenitor of the descendant halo. This affects 2-3% 
of all halo links in the Bolshoi simulation. These links 
can be problematic in two cases — either a) the descen- 
dant halo was not identified by the halo finder, or b) the 
descendant halo identified in the particle-based merger 
trees is a host halo containing the actual descendant 
halo. Hence, it is important to consider all options for 
descendants of these halos before concluding that they 
represent tidal mergers into the most massive host. 

3. All links where the most-massive progenitor is beyond 
a predetermined ratio in Mvii or v'max from the descen- 
dant halo. We choose Mvh-.bi-eak = 0.5 deX and V,nax.break 

= 0.15 dex; this affects 2% of all particle-based links in 
the Bolshoi simulation at z = for ROCKSTAR, increas- 
ing to 4% (depending on halo mass) at very high red- 
shifts where the mass accretion rate is higher. In con- 
trast, for BDM, the limited number of particles used to 
determine links (250) results in 10-20% of links at high 
halo mass (M > IQ^^Mq) being clearly spurious in this 
way, with a direct correlation to the timestep length. 
Such links are almost always cases where the most- 
massive progenitor was misidentified or mis-linked in 
the particle-based merger tree — and hence, those halos 
may have different descendants as mentioned in the pre- 
vious item. 

Then, for all halos at ?„ without progenitors, we scan for po- 
tential progenitors among all the halos without descendants at 
f„_i . To do so, we rank potential progenitors by likelihood ac- 
cording to the metric ii(e,c); if any potential progenitors are 
found within a predetermined threshold dmatch, the one with 
the highest likelihood is assigned as the most-massive pro- 
genitor of the halo in question. We set dmatch = 15; for all of 
our tested combinations of halo finders and simulation codes, 
this still restricts progenitors to be well within the virial radius 
of their descendant. 

With BDM, we find that this procedure is not sufficient for 
some outlying cases. In particular, we find that the halo- 
finding algorithm used in EDM has occasional trouble locat- 
ing centers of massive objects. In cases where multiple dense 
peaks are present within an overdense region (as is the case 
with major mergers), EDM may switch between those peaks 
in successive timesteps when it attempts to determine halo 
properties. As such, massive halos may in rare cases (< 3% 
of merger tree links) switch to a new location up to a virial 
radius away for dozens of time steps or more (see also e.g. 
discussion in Wetzel et al.|2009^ . These cases are obviously 
not physical— but even so, it is impossible to call one of the 
centers more "correct" than the other without a careful phase- 
space analysis. Because EDM and many other popular halo 
finders do not find halos in phase space, we have chosen to ex- 
plicitly allow an exception for such cases in our merger trees. 
As such, for those halos which still do not have physically ac- 
ceptable progenitors, we allow a progenitor to be matched at 



the previous timestep if a) it is within the virial radius of the 
descendant halo, and b) if its v^ax is within Vmax.break = 0.15 
dex of the descendant halo. 

Even so, some halos at f„ will still have no progenitors; 
for these halos, two options remain. Either the progenitor is 
missing from the halo catalog at the previous timestep, or the 
progenitor has fallen below the mass-completeness limit of 
the simulation. Determining which option is correct requires 
analysis of earlier timesteps — however, for large simulations 
like Bolshoi, only a few timesteps may be able to fit into mem- 
ory simultaneously. To allow for a more flexible analysis, we 
create a placeholder halo, called a phantom halo, in the halo 
catalog at timestep t„-\ for each halo remaining at t„ without 
a progenitor Phantom halos may be created at several suc- 
cessive timesteps to allow for cases in which the halo finder 
loses track of a halo for multiple timesteps; this latter case 
most often occurs for major mergers. However, to avoid spu- 
rious links between accidentally coincident true and phantom 
halos, we cease tracking phantom halos beyond a predeter- 
mined number of timesteps fy,/,„,„. For Bolshoi, we find that 
tracking phantom halos for up to four timesteps is sufficient 
to patch over the vast majority of cases for missing halos. 

For halos at f„_i which still have no descendants, there are 
also two options. Either they are not the most-massive pro- 
genitor of their descendant (i.e., they are tidal mergers), or 
they are spurious fluctuations in the halo catalogs. We use the 
formula in Eq.lTlto discriminate between these two cases. In 
particular, we find that a tidal acceleration field below |r| = 
0.3-0.4 km s"' Myr"' comoving Mpc"' is a robust indicator 



that a tidal merger is extremely unlikely (see 5 6.2 1. As such, 
halos above that threshold are assigned descendants accord- 
ing to the halo exerting the largest tidal field; halos below that 
threshold are deleted from the catalogs. This method agrees 
in over 95 % of cases with the original particle-based merger 
trees (see 5 6.2 1, the remaining cases being those where sub- 
halos were mcorrectly merged into their host in the particle- 
based trees. 

A graphic summary of the most important steps of this al- 
gorithm is shown in Fig. [1] As compared to the raw particle- 
based merger trees for the BDM halo finder on Bolshoi (which 
used only 250 particles to track mergers) this stage of the al- 
gorithm, 10-20% of links at each timestep need repairs for 
halos with M > IO^^Mq and 5% are repaired for halos with 
M < IQ^^Mq; these are largely halos where the progenitor 
was clearly mis-identified in the particle-based trees. By 
comparison, in the particle-based trees for ROCKSTAR (which 
used all halo particles to track mergers), only 2-3% of links 
at each timestep are changed. 

5.4. Stage Two: Cleaning Up Halo Tracks 

The main effect of the previous stage is to create new halos 
where there are gaps in the halo catalogs. However, compara- 
tively few halos are removed — only those which have no ob- 
vious descendant at the next timestep. Thus far, we have only 
been concerned with the physical validity of individual halos, 
as opposed to full halo tracks — that is to say, the lineage of 
most-massive progenitors for a given halo. Checking the va- 
lidity of halo tracks extends the phase-and-mass-space checks 
of the previous section with temporal checks — e.g., it allows 
us to remove halos if they only appear for a few timesteps in 
the catalogs. To complete the removal of spuriously-detected 
halos, we detect and remove three types of problematic halo 
tracks; 
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1 . Halos whose lineage of most-massive progenitors con- 
tains more than a fraction fpham of phantom halos. 
Halos with too many phantom progenitors are usually 
those which are on the bare threshold of detection. For 
more massive halos which meet this criterion, it is ex- 
tremely unlikely that they represent valid detections. 
We choose fpham = 25%, which removes about 0.1% 
of halos at all timesteps in the Bolshoi simulation for 
ROCKSTAR and 0.3% of halos at all timesteps for BDM. 

2. Halos which are tracked for fewer than tt, -acted 
timesteps. Again, for massive halos, it is extremely un- 
likely that they represent valid detections if they are not 
tracked beyond a few timesteps. However, for smaller 
halos and earlier redshifts, setting this threshold too low 
can lead to removal of legitimate halo tracks. We find 
that a value of t, racked = 5 represents a good compromise, 
removing 0.2-0.5% of all halos across all timesteps in 
Bolshoi for both halo finders without adversely affect- 
ing halo accretion at early times. 

3. Subhalos whose tracks do not extend outside of the 
virial radius of their host and which are tracked for 
fewer than t, racked, subs timesteps. Ordinarily, one might 
expect all such halos to be spurious, but it is conceivable 
that a halo might form outside the virial radius of a host 
and be accreted in between timesteps if the timesteps 
are sufficiently far apart. We find that a threshold of 
(tracked, subs = 10 removcs an additional 0.1-0.4% of ha- 
los at all timesteps in the Bolshoi simulation for BDM 
and 0.1% for ROCKSTAR. 

We summarize all the parameters used in our algorithm in 
Table[T] While the number of algorithm parameters may seem 
large at first glance, this is a reflection of the large number 
of physical sanity tests which our approach enables in terms 
of excluding unphysical halos. The fact that some halos fail 
each of the sanity tests for reasonable fiducial values of the 
parameters suggests that (at least with current halo finders) the 
integrity of the merger trees would otherwise be compromised 
by unphysical halo tracks. 

6. TESTS OF THE METHOD 

6.1. Tests of Gravitational Evolution 

Despite the complexity of the underlying particle distri- 
bution for dark matter halos, the approximation of spher- 
ical halos with NFW profiles appears to work remarkably 
well. Figurej2]shows a comparison of the predicted progenitor 
halo properties with the actual progenitor properties (as deter- 
mined from particle merger trees). Figurel2]demonstrates that 
the gravitational evolution code can evolve halos from one 
timestep to another with tightly-controlled positional errors 
on the order of the force resolution (1 kpc /;"') and velocity 
errors ranging from 2-50 km s"' , depending on halo mass, for 
Bolshoi with the ROCKSTAR halo finderp]For EDM, similar 
results are obtained albeit with higher positional and velocity 
errors (by a factor of 2-3, except for the positions of halos 
with M > W^Mq), which are detailed in Appendix [a] with 
positional errors on the order of the force resolution; however 
BDM has higher velocity errors by a factor of 2-3, indepen- 
dent of halo mass. In the Consuelo simulation, we find similar 
results; see Appendix [B] for details. 

" See Appendix C for a comparison witli the SUBFIND halo finder 




Fig. 2. — Conditional density plots of the errors between the expected po- 
sitions and velocities of halos (obtained via gravitational evolution from the 
subsequent timestep) and the actual values in the halo catalog as a function 
of halo mass. These plots show the errors for one timestep lasting 42 Myr at 
z = 0, using the Bolshoi simulation and the ROCKSTAR halo finder. Superim- 
posed over the density plots are red lines showing the linear average of the 
errors; blue dashed line in the positional error plot (top) indicates the force 
resolution. The bifurcation in positional eiTors at low masses is an artifact of 
the simulation code (ART), as both ROCKSTAR and BDM have similar features 
on Bolshoi, whereas the Consuelo simulation (Fig. [9} does not have such a 
feature. The linear average is significantly offset from the median for velocity 
errors due to a long tail in the error distribution. 

Figure l3] shows the scaling of position and velocity errors 
with redsnift for Bolshoi with the ROCKSTAR halo finder. In 
Bolshoi, the timestep outputs are not equally spaced; they are 
between 80 -92 Myr befor e a = 0.8 1 (and 40-46 Myr after that 
scale (Klypi n et al.|201l] l). For massive halos, positional er- 
rors scale most predominantly as a function of timestep length 
(suggesting velocity inconsistencies as the most likely rea- 
son), whereas for smaller halos, positional errors are relatively 
fixed close to the force resolution; these errors result in pre- 
dicted halo progenitor locations that are well within the virial 
radius of their actual progenitors across all masses. 

Velocity errors for ROCKSTAR are comparatively indepen- 
dent of timestep length, and are on the order of 2-50 km s"' at 
z = — that is, fractional errors on the order of 0.5-5% in halo 
peculiar velocities — depending on the mass of the halo. How- 
ever, they rise substantially at higher redshifts when major 
mergers are more common. For BDM, the velocity errors are 



10 



BEHROOZI ET AL 



TABLE 1 

Summary of algorithm parameters 



Variable 



Chosen Value 



Description 



Section 



Threshold for breaking links in the particle-based merger trees according to the distance metric. 
Threshold for considering a proposed link acceptable according to the distance metric. 
Threshold for breaking most-massive progenitor links if the mass ratio of progenitor and de- 
scendant exceeds this value. 

Threshold for breaking most-massive progenitor links if the Vn,ax ratio of progenitor and de- 
scendant exceeds this value. 

Threshold for the tidal field to consider a tidal merger physically acceptable. 

Threshold of acceptability for the fraction of time a halo track may contain phantom halos. 

Number of timesteps to calculate expected phantom halo locations. 

Minimum number of timesteps a halo can exist in the catalogs in order to be considered 
physical. 

Minimum number of timesteps a subhalo can exist in the catalogs in order to be considered 
physical. 



^byeak 
^match 

vir,break 



^mQX,hreak 

atidal 
fphant 
^phant 

h racked 
^trackedsubs 



0.4 km s 



3.2 
15 

0.5 dex 



0.15 dex 



Myr ' cmvg Mpc 
25% 
4 
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Fig. 3. — Comparison of errors at different timesteps for the ROCKSTAR 
halo finder on Bolshoi. Positional errors (in real distance, as opposed to co- 
moving distance) appear to be mostly independent of redshift, whereas ve- 
locity errors do not. The velocity errors at later redshifts are reduced due to 
the reduced merger frequency. For the most massive halos (M ~ IO'^^Mq), 
there is a clear break in the positional errors at a ~ 0.8; this is because the 
timestep length doubles for a < 0.8, which suggests that velocity errors are 
largely responsible for the resulting positional errors at these masses. 

elevated (2-100 km s"') independent of redshift, by a factor 
of 1-5 as compared to ROCKSTAR (see Appendix [A]!. These 
errors suggest either a potential weakness in calculating halo 



Fig. 4. — A conditional density plot of the tidal force acting on ROCKSTAR 
halos in Bolshoi at z = 0.01 . The grey density plot shows the tidal force acting 
on most-massive progenitors (i.e., halos which do not tidally merge), and the 
red density/size plot shows the tidal force acting on tidally merging halos (as 
identified in the particle-based merger trees). The tidal force is expressed in 
terms of differential acceleration (km s"' Myr"' ) per unit distance (comoving 
Mpc); the dotted line represents our classification threshold for physically- 
acceptable tidal mergers. 

velocities for BDM or a better ability for ROCKSTAR to re- 
cover halo velocities due to its use of additional phase-space 
information. 

6.2. Tests of Tidal Force Calculations 

Figure [4] shows the maximum tidal fields as calculated 
for tidally merging halos and non-tidally-merging halos (i.e., 
most-massive progenitors). While it is clear that tidal fields 
above a certain threshold do not necessarily imply a merger, 
there is a clear threshold in the tidal field below which a 
merger is very unlikely to happen. However, this is entirely 
sufficient for our algorithm to function — as the halos at the 
next timestep are used to determine the halos at the current 
timestep which are most-massive progenitors, the remaining 
halos without descendants at the current timestep must either 
be tidal mergers or statistical fluctuations, and thus a cut on 
the magnitude of the tidal field is all that is necessary to dis- 
tinguish them. 

In terms of our ability to predict the target for tidally merg- 
ing halos (i.e., the halo into which the merging halo dissi- 
pates), choosing the halo which exerts the strongest tidal field 
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Fig. 5. — Fraction of merging halos for whicli the merger target calculated 
via the tidal force method matches the merger target in the particle-based halo 
merger trees for Bolshoi, using the ROCKSTAR halo finder. 



sistance to numerical issues; for example, we track 90% of 
Vmax ^ 200 km s"' halos to z = 6.2, whereas Klypin et al. 
(2011 1 tracks the same fraction of v^ax ^ 200 km s"' halos 



only to z = 1 .0. 
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Fig. 6. — Fraction of Bolshoi halos at z = tracked to a given redshift for 
the ROC KSTAR halo finder, a s a function of Vmax. Compare to the analogous 
figure in |Klypinet"ar]j2011^ . 

on the halo in question gives results which are in excellent 
agreement with particle-based merger trees, as shown in Fig- 
ure g] 

6.3. Tests of Halo Tracking 

One of the most important tests of any merger tree is the 
degree to which it correctly follows halo progenitors back in 
time. We have imposed stringent cuts on the physicality of 
halo links, but it is important to show that these cuts do not 
truncate otherwise correct halo tracks in the merger trees. We 
demonstrate our ability to track halos back to early redshifts in 
Figure[6] Clearly, halo tracks will end when the most-massive 
progenitor of a halo falls below the resolution limit of the 
simulation. We recover similar tracking statistics at the 50% 
threshold a s compa red to the advanced particle-based trees in 
Klypin et ar| ( |201 1|) (e.g., 50% of Vmax 'y 200 km s"' halos are 
tracked to z = 7 in Klypin et al.||201 1" whereas we track the 
same fraction halos to z = 8.3) — ^these fractions correspond to 
the expected mass accretion histories of such halos. However, 
as compared to Klypin et al._(201 1) , we have vastly higher re- 
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Fig. 7. — The fractional changes in the mass function for Bolshoi with 
the ROCKSTAR halo finder for phantom halos added (upper panel) and halos 
removed (lower panel). The lower panel excludes phantom halos which were 
added and later removed; as such, it represents a consistency check for only 
the halos returned by the halo finder. The number of halos added by our 
consistency algorithm is roughly equal to the number of halos deleted, and 
both are small in comparison to both the total number of halos and the number 
of satellites across all masses. Towards the halo resolution limit, the number 
of spurious halos and the number of phantom halos necessary to fill in gaps 
in the merger tracks increases. 



6.4. Effects on the Halo Mass Function 

With so many ways to add and delete halos from the merger 
tree and catalogs, it is important to check the effects on the 
overall halo mass function. In the case of ROCKSTAR on Bol- 
shoi, the number of added halos (phantoms) and deleted (in- 
consistent) halos is less than 1% for z < 1 compared to the 
total mass function; additionally, the number of added and 
deleted halos are comparable across a wide range of halo 
masses, as shown in Figure |7] so that the net effect on the 
overall mass function is almost negligible. EDM performs 
similarly well; the number of deleted halos is in the 1-2% 
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Fig. 8. — The fractional changes in the mass function for Bolshoi with 
the BDM halo finder for halos removed. As in Fig. |7] this figure excludes 
phantom halos which were added and later removed; as such, it represents a 
consistency check for only the halos returned by BDM. As with ROCKSTAR, 
the number of halos requiring removal is small compared to both the overall 
mass function and the number of satellites. 

range (as shown in Fig. |8]l, with also 1-2% added phantom 
halos. At high masses (M ^ lO'^^M©), the current version 
of ROCKSTAR suffers from a minor bug related to how ha- 
los are stitched together across processor domain boundaries 
and so has slightly worse consistency than BDM; nonetheless, 
the merger tree algorithm is able to repair all affected halos. 
At lower masses, the ability of ROCKSTAR to determine halo 
properties in phase space results in increased consistency as 
compared to EDM. 

Interpreted in a different way, these numbers imply that 
the raw halo catalogs from ROCKSTAR are internally self- 
consistent at the 1% level at z = 0, and similarly for BDM. 
As discussed in Appendix Ib] the results are very similar for 
the Consuelo simulation, with the ROCKSTAR halo finder be- 
ing self-consistent at the 0.5-1% level across all halo masses. 
While these numbers cannot be directly interpreted as the ac- 
curacy of the mass function, they directly represent the preci- 
sion with which each halo finder can recover the mass func- 
tion. Nonetheless, the precision does indirectly set a limit for 
the accuracy of the halo finder at a given timestep; it is worth 
noting that the precision of both halo finders is significantly 
better than the current 5% statistical uncertainties in the halo 
mass function ( Tinker et al.|2008) l. 

We remark that, while the current implementation of our al- 
gorithm only allows us to directly test the consistency of posi- 
tions and velocities, it is in fact possible to use our algorithm 
to test the accuracy of halo masses and subhalo masses and 
correct for systematic bias therein. Namely, one can either 
search directly for systematic alignments of velocity errors 
(i.e., differences between predicted and actual halo velocities) 
towards nearby halos, or one can simply adopt a parametriza- 
tion for the systematic bias in halo masses and search for a 
fit which minimizes the velocity errors. We reserve this topic 
for a future paper, however, as a careful calibration and un- 
derstanding of the gravitational force formula is necessary to 
confirm that the acceleration calculations do not suffer from 
more systematics than the halo mass calculations. 

7. CONCLUSIONS AND DISCUSSION 



We have developed a new algorithm for creating halo 
merger trees which explicitly ensures dynamical consistency 
of halo properties across timesteps. Our method has several 
advantages which when combined provide excellent robust- 
ness and tracking of both halos and subhalos: 

1 . The ability to more accurately track halos than particle- 
based merger trees. 

2. The ability to explicitly evaluate the precision with 
which the halo finder can recover halo positions and 
velocities. 

3. The ability to correct for halo finder incompleteness by 
adding halos with gravitationally consistent properties 
to the halo catalogs. 

4. The ability to correct for incorrect tidal mergers in 
particle-based trees. 

5. The ability to construct merger trees even in the ab- 
sence of full particle tracking; in addition, the ability to 
construct merger trees even when particle-based merger 
trees are available only for a small region of the simu- 
lation. 

6. The ability to remove halos which fail any of a large 
number of sanity tests (gravitational inconsistency, tidal 
inconsistency, tracking inconsistencies) to increase the 
purity of the resulting merger trees. In addition, the 
ability to uncover previously unknown problems in halo 
finders. 

7. The ability to explicitly evaluate the self-consistency of 
the mass function returned by the halo finder In the 
future, the ability to explicitly evaluate the accuracy of 
the mass function returned by the halo finder. 

The code used is publicly available at 
http : //code . google . com/p/ consistent-tree si 
This algorithm has been used to create merger trees for two 
simulations, the Bolshoi simulation (2048-' particles in a 250 
/!"'Mpc box) and the Consuelo simulation (WOO^* particles in 
a 420 /!"'Mpc box), and have additionally compared two halo 
finders (EDM and ROCKSTAR) on the Bolshoi simulation. 
The halo finders perform similarly for many instances, and 
self-consistency is problematic only at the 1-2% level for 
both. 

The defining feature of our algorithm is the ability to pre- 
dict the evolution of halo locations, velocities, and properties. 
This ability applies within timesteps as well, which has rele- 
vance for semi-analytical models (of, e.g., reionization) which 
may require information about halo properties at many more 
timesteps than are storable from the simulation. Our approach 
allows for effectively infinite timestep resolution in terms of 
individual halo properties; in addition, because tidal forces 
are calculated in each step, it becomes possible to estimate the 
timing of halo mergers and thereby recover all the information 
which is lost by saving fewer snapshots from a simulation. 

The merger trees and halo catalogs thus generated are use- 
ful not only for improved cosmological predictions from sim- 
ulations, but because of the improved tracking performance, 
they are also useful for precision predictions of a wide range 
of observables related to merger trees: the galaxy-galaxy 
merger rate, the dynamical friction timescale of subhalos. 
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halo mass accretion histories (especially as functions of en- 
vironment and assembly time), and semi-analytical / semi- 
empirical models of galaxy formation; several studies explor- 
ing these predictions are already in progress. 
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APPENDIX 

BDM POSITION / VELOCITY TRACKING AS COMPARED TO ROCKSTAR 

Figure|9]shows position and velocity tracking errors for BDM as compared to ROCKSTAR. For low-mass halos (M < 10"Mq), 
ROCKSTAR performs ideally, almost at the force resolution of the simulation. In comparison, BDM gives positions accurate to 
within a factor of a few of the force resolution (2-3). For high-mass halos (M > IO'^Mq), both halo finders perform similarly. 
Because of its ability to find halos in phase space, ROCKSTAR is always better able to recover halo velocities than BDM; the 
resulting velocity errors are smaller by a factor of 1-5. 

CONSUELO 



For comparison with Bolshoi, we include results for the Consuelo simulation analogous to Fig.[3]in Fig. 10 and figures analo- 
gous to Figs. [6]|7] and[8]in Fig. 1 1 We find identical results for the Consuelo simulation as compared to the Bolshoi simulation, 
with only a few exceptions. The main difference in terms of the more Umited mass and force resolution of Consuelo means that 
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Fig. 9. — Comparison of position and velocity errors at z = for the ROCKSTAR and BDM halo finders on Bolshoi, similar to Fig. [2] 
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Fig. 10. — Comparison of errors at different timesteps for the ROCKSTAR halo finder on Consuelo, analogous to Fig.|3] 



the positional errors for recovered halos are higher (Fig. 10 1 and that halos with maximum circular velocities of 50-100 km s ' 
are no longer above the resolution limit (and hence are excluded from Fig. 111. In addition, the timesteps are logarithmically 



spaced in scale factor, meaning that at earlier times, the timesteps are more closely spaced. This contributes to the positional 
errors becoming smaller with increasing redshift for Consuelo (as contrasted with them becoming larger with increasing redshift 
in Bolshoi). Finally, on account of like ly on account of the larger analysis volume per processor resulting in fewer errors from 
the boundary effect issue mentioned in ^6.4| the self-consistency of the ROCKSTAR halo finder on Consuelo is slightly better than 
that for Bolshoi. In Consuelo, it is always at the 1% level or less for all halo masses, whereas it is sometimes only at the 2% level 
for high-mass (M > IO'^^Mq) halos in Bolshoi. 

SUBFIND 



We have additionally used our algorithm on the SUBFIND halo finder ( Springel et al.|[200T l on a small subvolume (50 Mpc 
/i"' on a side) of Bolshoi. Because the algorithm described in this paper requires halo masses even for subhalos, and because 
SUBFIND by default only returns particle membership for subhalos, a spherical overdensity mass calculator was used on the halo 
centers returned by SUBFIND to calculate the unbound mass within 7?vir for central halos as well as Vn,ax and Ry^^.^^ for all halos; 
these latter two properties were used to estimate subhalo mass. Because the mass derived this way is inconsistent with the mass 
for central halos, we do not show results for halo tracking or self-consistency, which would unfairly penalize SUBFIND. 

Nonetheless, it is possible to calculate the position / velocity precision for halos returned by SUBFIND, as the number of halos 
affected by inconsistent satellite masses across a single timestep is small. We show results analogous to Fig.|3]in Fig. 12 We 
find that SUBFIND appears to have exquisite velocity precision at the expense of some position precision (2-5 times worse than 
ROCKSTAR; see Fig. [12]). SUBFlND's velocity precision does not necessarily translate into velocity accuracy, however. Indeed, 
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Fig. 1 1 . — Left panel: Fraction of Consuelo halos at z = tracked to a given redsliift for the ROCKSTAR halo finder, as a function of Vmax, analogous to Fig. [6] 
Right panel: Fraction of halos found by the ROCKSTAR halo finder which were removed in the process of physical consistency checking as a function of mass 
and redshift, analogous to Figs.|7]and[8] 




Fig. 12. — Comparison of en'ors at different timesteps for the SUBFIND halo finder on a subregion of Bolshoi, analogous to Fig.|3] 



SUBFIND averages particle positions to yield a bulk halo velocity, but as demonstrated in Behroozi et al.| ( [20TT l, the difference 
between the halo core velocity (which would correspond more closely to the central galaxy velocity) and the halo bulk velocity 
can be on the order of 20% of the halo velocity dispersion, up to 400 km s"' for the largest clusters at z = 0. Hence, while 
SUBFIND's velocities are remarkably consistent, they do not necessarily correspond to observable properties. 



